Missing data, incomplete taxa, and phylogenetic accuracy.

نویسنده

  • John J Wiens
چکیده

The problem of missing data is often considered to be the most important obstacle in reconstructing the phylogeny of fossil taxa and in combining data from diverse characters and taxa for phylogenetic analysis. Empirical and theoretical studies show that including highly incomplete taxa can lead to multiple equally parsimonious trees, poorly resolved consensus trees, and decreased phylogenetic accuracy. However, the mechanisms that cause incomplete taxa to be problematic have remained unclear. It has been widely assumed that incomplete taxa are problematic because of the proportion or amount of missing data that they bear. In this study, I use simulations to show that the reduced accuracy associated with including incomplete taxa is caused by these taxa bearing too few complete characters rather than too many missing data cells. This seemingly subtle distinction has a number of important implications. First, the so-called missing data problem for incomplete taxa is, paradoxically, not directly related to their amount or proportion of missing data. Thus, the level of completeness alone should not guide the exclusion of taxa (contrary to common practice), and these results may explain why empirical studies have sometimes found little relationship between the completeness of a taxon and its impact on an analysis. These results also (1) suggest a more effective strategy for dealing with incomplete taxa, (2) call into question a justification of the controversial phylogenetic supertree approach, and (3) show the potential for the accurate phylogenetic placement of highly incomplete taxa, both when combining diverse data sets and when analyzing relationships of fossil taxa.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The use and validity of composite taxa in phylogenetic analysis.

In phylogenetic analysis, one possible approach to minimize missing data in DNA supermatrices consists in sampling sequences from different species to obtain a complete sequence for all genes included in the study. We refer to those complete sequences as composite taxa because DNA sequences that are combined belong to different species. An alternative approach is to analyze incomplete supermatr...

متن کامل

Does adding characters with missing data increase or decrease phylogenetic accuracy?

Missing data are a widely recognized nuisance factor in phylogenetic analyses, and the fear of missing data may deter systematists from including characters that are highly incomplete. In this paper, I used simulations to explore the consequences of including sets of characters that contain missing data. More specifically, I tested whether the benefits of increasing the number of characters out...

متن کامل

Missing data and the accuracy of Bayesian phylogenetics

The effect of missing data on phylogenetic methods is a potentially important issue in our attempts to reconstruct the Tree of Life. If missing data are truly problematic, then it may be unwise to include species in an analysis that lack data for some characters (incomplete taxa) or to include characters that lack data for some species. Given the difficulty of obtaining data from all characters...

متن کامل

Missing data and the design of phylogenetic analyses

Concerns about the deleterious effects of missing data may often determine which characters and taxa are included in phylogenetic analyses. For example, researchers may exclude taxa lacking data for some genes or exclude a gene lacking data in some taxa. Yet, there may be very little evidence to support these decisions. In this paper, I review the effects of missing data on phylogenetic analyse...

متن کامل

Highly Incomplete Taxa Can Rescue Phylogenetic Analyses from the Negative Impacts of Limited Taxon Sampling

BACKGROUND Phylogenies are essential to many areas of biology, but phylogenetic methods may give incorrect estimates under some conditions. A potentially common scenario of this type is when few taxa are sampled and terminal branches for the sampled taxa are relatively long. However, the best solution in such cases (i.e., sampling more taxa versus more characters) has been highly controversial....

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Systematic biology

دوره 52 4  شماره 

صفحات  -

تاریخ انتشار 2003